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Abstract Quantitative helio- and asteroseismology require very precise measure- 
ments of the frequencies, amplitudes, and lifetimes of the global modes of stellar 
oscillation. It is common knowledge that the precision of these measurements depends 
on the total length (T), quality, and completeness of the observations. Except in a 
few simple cases, the effect of gaps in the data on measurement precision is poorly 
understood, in particular in Fourier space where the convolution of the observable 
with the observation window introduces correlations between different frequencies. 
Here we describe and implement a rather general method to retrieve maximum 
likelihood estimates of the oscillation parameters, taking into account the proper 
statistics of the observations. Our fitting method applies in complex Fourier space 
and exploits the phase information. We consider both solar-like stochastic oscillations 
and long-lived harmonic oscillations, plus random noise. Using numerical simulations, 
we demonstrate the existence of cases for which our improved fitting method is less 
biased and has a greater precision than when the frequency correlations are ignored. 
This is especially true of low signal-to-noise solar-like oscillations. For example, we 
discuss a case where the precision on the mode frequency estimate is increased by a 
factor of five, for a duty cycle of 15%. In the case of long-lived sinusoidal oscillations, 
a proper treatment of the frequency correlations does not provide any significant 
improvement; nevertheless we confirm that the mode frequency can be measured 
from gapped data at a much better precision than the 1/T Rayleigh resolution. 

Keywords: Helioseismology, Observations; Oscillations, Solar; Oscillations, Stellar 
1. Introduction 

Solar and stellar oscillations are a powerful tool to probe the interior of stars. In 
this paper we classify stellar oscillations into solar-like or deterministic. Solar-like 
oscillations are stochastically excited by turbulent convection and are present in 
the Sun and other main-sequence, subgiant, and giant stars (see e.g. Bedding and 
Kjeldsen, 2007 and references therein). Deterministic oscillations are seen in classical 
pulsators and have mode lifetimes much longer than any typical observational run; 
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one of the best studied objects in this class is the pre-white dwarf PG1159— 035 
also known as GW Vir (Winget et al., 1991). In practice, observations of solar-like 
or deterministic pulsations always have an additional stochastic component due to 
instrumental, atmospheric, stellar, or photon noise. 

An important aspect of helio- and asteroseismology is the determination of the 
parameters of the global modes of oscillation, especially the mode frequencies. In 
the case of the Sun, it is known (Woodard, 1984) that the measurement precision 
is limited by the stochastic nature of the oscillations (realization noise). Libbrecht 
(1992) and Toutain and Appourchaux (1994) have shown that realization noise is 
expected to scale like 1/vT, where T is the total duration of the observation. A com- 
mon practice is to extract the solar mode parameters from the power spectrum using 
maximum likelihood estimation (MLE, see e.g. Anderson, Duvall, and Jefferies, 1990; 
Schou, 1992; Toutain and Appourchaux, 1994; Appourchaux, Gizon, and Rabello- 
Soares, 1998; Appourchaux et at, 2000). In its current form, however, this method 
of analysis is only valid for uninterrupted time-series. This is a significant limitation 
because gaps in the data are not uncommon (daily cycle, bad weather, technical 
problems) . The gaps complicate the analysis in Fourier space: the convolution of the 
data with the observation window leads to correlations between the different Fourier 
components. The goal of this paper is to extend the Fourier analysis of solar and 
stellar oscillations to time series with gaps, using appropriate maximum likelihood 
estimators based on the correct statistics of the data. 

Section 2 poses the problem of the analysis of gapped time series in Fourier space. 
In Section 3 we derive an expression for the joint probability density function (PDF) 
of the observations, taking into account the frequency correlations. Our answer is 
consistent with an earlier (independent) derivation by Gabriel (1994). Based on 
this PDF, we derive maximum likelihood estimators in Section 4. In Section 5 we 
recall the "old method" of maximum likelihood estimation based on the unjustified 
assumption that frequency bins are statistically independent. Section 6 explains the 
set up of the Monte-Carlo simulations, used to test the fitting methods on artificial 
data sets. In Section 7 we present the results of the Monte-Carlo simulations and 
compare the new and old fitting methods. For the sake of simplicity, we consider 
only one mode of oscillation at a time (solar-like or sinusoidal) . We present several 
cases for which our new fitting method leads to a significant improvement in the 
determination of oscillation parameters, and in particular the mode frequency. 



2. Statement of the Problem 

2.1. The Observed Signal in Fourier Space 

Let us denote by y = {yi} the time series that we wish to analyse. It is sampled at 
times ti = iAt, where i is an integer in the range < i < TV— 1, and At = one minute 
is the sampling time. All quantities with a tilde are defined in the time domain. The 
total duration of the time series is T = NAt. By choice, all of the missing data 
points were assigned the value zero: this enables us to work on a regularly sampled 
time grid. Formally, we write 

yi=WiXi, i = 0, 1, . . . , N - 1. (1) 
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where x is the uninterrupted time series that we would have observed if there had 
been no gaps and w is the window function defined by Wi = 1 if an observation is 
recorded at time ti and Wi = otherwise. The x is drawn from a random process, 
whose statistical properties will be discussed later. 
We define the discrete Fourier transform y of y by 

1 N ~ 1 

& = Jf E * for J' eN ' ( 2 ) 

i=0 

where Uj = jAv is the frequency and Av = 1/NAt. Note that ijj = 2//v-j an( i 
ijj = y_j, where the star denotes the complex conjugate. The Fourier transform has 
periodicity 1/ At or twice the Nyquist frequency. 

Our intention is not to fit the complete Fourier spectrum, but a rather small inter- 
val that contains one (or a few) modes of stellar oscillation. Thus, we extract a section 
of the data of length M starting from a particular frequency v q , as shown in Fig- 
ure 1(c). This subset of the data is represented by the vector y = [yo, yi, — , j/jvf— l] 
with components 

Vi=V q +i, i = 0, 1, ...,M- 1. (3) 

Using the above definition of the Fourier transform, the vector y is given by the 
convolution of x with the window w: 

M+p-l 

Vi= Wi-j Xg+j. (4) 

3=—P 

The integer p in Equation (4) refers to the cutoff frequency v v beyond which the 
observation window has no significant power. Truncating the window function at 
frequency v v is a simplification of the general problem. Our main goal, however, 
is, given a known window function, to study its effects on the determination of the 
parameters of stellar oscillations. Figure 1 is a schematic representation in Fourier 
space of the convolution of a single mode of oscillation by the window function. The 
observed signal is spread over some frequency range and, as we shall see later, its 
statistical properties are affected. 

We note that, in practice, one can never completely isolate one single mode of 
oscillation in the power spectrum. In particular, other modes with frequencies outside 
of the fitting range can leak into it after convolution by the temporal window func- 
tion. Hence, fitting one mode of oscillation is a simplification. But our first objective 
is to try to study the effects of gaps, independently from the complications associated 
with a badly specified model. 

Equation (4) can be rewritten in matrix form as 

y = Wx, (5) 
where the vector x = [xq, X\, . . . , XM+2p-i] T of length M + 2p is defined by 

xi = Xg-p+i, i = 0, 1, . . . ,M + 2p - 1, (6) 
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Figure 1. Schematic representation in Fourier space of the convolution of the signal x with 
the window function w. For the sake of simplicity, only the power spectra of the different 
quantities arc shown here. Panel (a) shows the window function w and its cutoff frequency v p , 
panel (b) the unconvolved signal x, and panel (c) the observed signal y. Note that the selected 
section of the observed signal, starting at frequency u p , is of length M, while the unconvolved 
signal is of length (M + 2p). 



and W = [Wij] is the M x (M + 2p) rectangular window matrix with elements 



,j — where i = 0, 1, . . . , M — 1 and j = 0, 1, . . . , M + 2p — 1: 



W 



WQ 



WQ 



WQ 



(7) 



Note that Wi = w' t L i and that W is of rank M. 

Equation (5) is the master equation. Our goal is to extract the stellar oscillation 
parameters (contained in x), given the uncomplete information y. 
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2.2. Statistics of the Unconvolved Signal 

Here we describe the basic assumptions that we make about the statistics of the 
data in the Fourier domain. The unconvolved signal [a;] consists of a deterministic 
component [d] and a zero-mean stochastic component [e] such that 

x = d + e. (8) 

The deterministic component d may include deterministic stellar oscillations that 
are long-lived compared to the total length of the observation. The stochastic com- 
ponent e may include various sources of noise (e.g. stellar convection, photon noise, 
atmospheric noise, etc.) and stochastically excited pulsations as observed on the Sun. 

We assume that the ej are M + 2p independent random variables in the Fourier 
domain. This is equivalent to saying that the stochastic component of the signal in 
the time domain is stationary. We further assume that e is a Gaussian random vector 
with independent real and imaginary parts and covariance matrix 

E[e*e j ]=o$8 ij , i,j = 0, 1, . . . , M + 2p - 1, (9) 

where E denotes the expectation value and Oi is the standard deviation of e, at 
frequency Vi. One may invoque the central limit theorem to justify the choice of 
Gaussian distributions. The quantity af is the expected power spectrum at frequency 
vi, which may include background noise and peaks corresponding to the modes of 
oscillations (Duvall and Harvey, 1986; Appourchaux, Gizon, and Rabello-Soares, 
1998). In terms of a complex Gaussian random vector g with unit covariance matrix, 
E[g*g T ] = Im+2p, we can rewrite e as 

e = Sg, (10) 

where S is the (M + 2p) x (M + 2p) diagonal matrix 

S = diag(cr ,o"i, . . . ,<7 M+2p _i). (11) 

We emphasize that, although the axe uncorrelated random variables, the yi are 
correlated because of the multiplication of x by the window matrix [Equation (5)]. 



3. Joint PDF of the Complex Fourier Spectrum 

In this section we derive an expression for the joint probability density function of 
the observed signal y. This problem had already been solved by Gabriel (1994). We 
reach the same conclusion, independently and with more compact notations. We 
start by rewriting the master equation, Equation (5), as 

y = Wd+Cg, (12) 

where 

C = WS (13) 
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is a M x (M + 2p) matrix with rank M and singular value decomposition (Horn and 
Johnson, 1985, chapter 7.3) 

C = UZV H . (14) 

Here the superscript H denotes the Hermitian conjugate and U and V are unitary 
matrices of dimensions MxM and (M+2p) x (M+2p) respectively, i.e. U H U = lyi 
and V H V = Im+2p- The M x (M + 2p) matrix £ can be written as 

E = [A|0], A = diag(A ,Ai,...,A A /-i), (15) 

where Ao, Ai, . . . , \m-i are the M (positive) singular values of the matrix C . Thus, 
there exists a vector £ = V H g such that 

y = Wd+U[A\0]£. (16) 

Since g has unit covariance matrix and V is unitary, the vector £ is a complex 
Gaussian random vector of size M + 2p with unit covariance matrix. It is obvious 
from Equation (16) that there exists a lower-rank complex Gaussian random vector 
of length M, 77 = [£ , Ci, ■ ■ ■ , £m-i] T , such that 

y = Wd+UArj. (17) 

The variables £m 5 £m+i> ■ ■ ■ ,?M+2p-i are dummy variables, which do not enter in 
the description of y. Equation (17) is an important step, as the vector y of length 
M is now expressed in terms of M independent complex Gaussian variables. This 
enables us to write the PDF of y as 

Pv(y) = jPr,((UA)-\y-Wd)), (18) 

where Pr](r)) denotes the PDF of r\ and J is the Jacobian of the linear transformation 
r\ — > y. Since 77 is a complex Gaussian random vector with unit covariance, i.e. 
E[rfrj T ] = I M , we have 

= exp(-N| 2 ) ; (19) 

where we used the notation ||r7|| 2 = r] H rj. Since U is unitary and A is diagonal and 
real, the Jacobian of the transformation is given by 

A/-1 

J = |det((7A)| 2 = (detA) 2 = J| A 2 . (20) 

i=0 

Combining Equations (18), (19), and (20), we get the joint PDF of the observed 
vector y: 

exp(-||A- 1 t/ g (y- ^d)|| 2 ) 
r (detA) 2 

The above expression is, perhaps, more elegantly written as 



_ exp(-||Ct (2/ -^)|| 2 ) 
Py{y) - ^/ (detA)2 ^ 
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in terms of C\ the (M + 2p) x M Moore-Penrose generalized inverse of C (Horn 
and Johnson, 1985, chapter 7.3), 

C f = vtfl7 H = C H (CC H y\ (23) 

where E^ is the transpose of E in which the singular values are replaced by their 
inverse. One may ask, after the fact, if the quantity ([/A) -1 in Equation (18) is always 
defined. The answer would appear to be yes since the Moore-Penrose generalised 
inverse of C is perfectly well defined. It is not excluded, however, that some singular 
values Xi could be infinitesimally small. We have not encountered any such difficulty 
with the test cases given in Section 7. Should C be ill-conditioned in other 
simple truncated SVD would help avoiding a numerical problem. 

Before discussing the implementation of the method in Section 4, we should like 
to draw attention to a parallel between fitting data with temporal gaps and fitting 
data with spatial gaps. In order to understand this analogy, we refer the reader to the 
work of Appourchaux, Gizon, and Rabello-Soares (1998, Section 3.3.4) who discuss 
how to interpret the spatial leaks of non-radial oscillations that arise from the fact 
that only half of the solar disk can be observed from Earth. Their approach is similar 
to the one developed in this paper. 

4. Maximum Likelihood Estimation of Stellar Oscillation Parameters 

Let us assume that the stellar oscillation model that we are trying to fit to the data 
depends on a set of k parameters /i = (/io,^i, . . . ,fik-i)- These parameters may 
be the amplitude, the phase, the frequency, the line asymmetry, the noise level, etc. 
The basic idea of maximum likelihood estimation is to pick the estimate /i* that 
maximizes the likelihood function. The likelihood function is another name for the 
joint PDF [Equation (22)] evaluated for the sample data. In practice, one minimizes 

AI-l 

C(n) = - \np y = \\C\y- Wd)\\ 2 + 2 ^ In Aj + constant, (24) 

i=0 

rather than maximizing the likelihood function itself. In the above expression, the 
quantities and Xi all depend implicitly on the model parameters \x through the 
covariance matrix S. The vector d also depends on the model parameters in the case 
deterministic oscillations. The probability of observing the sample data is greatest if 
the unknown parameters are equal to their maximum likelihood estimates /x*: 

/i* = argmin C{fi). (25) 

The method of maximum likelihood has many good properties (Brandt, 1970). In 
particular, in the limit of a large sample size (M large), the maximum likelihood 
estimator is unbiased and has minimum variance. 

What is particularly new about our work is the minimization of the likelihood 
function given by Equation (24). We use the direction set method, or Powell's algo- 
rithm, to solve the minimization problem with a computer. In practice, the result of 
the fit depends on the initial guess and the fractional tolerance of the minimisation 
procedure (the relative decrease of C in one iteration). The dependence of the fitted 
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parameters on the initial guess is due to the fact that the function C may have local 
minima in addition to the global minimum. We will address this issue in more detail 
in Section 7. 

4.1. Special Case: Solar-Like Oscillations 

In the case of solar-like oscillations, there is no deterministic component and the 
log-likelihood becomes 

M-i 

£( M ) = ||C f ?/|| 2 + 2 ^2 In Aj + constant. (26) 

4.2. Special Case: Deterministic Oscillations plus White Noise 
If background white noise is the only stochastic component then 

&i = era = constant, i = 0, 1, . . . , M + 2p — 1. (27) 
The log-likelihood function becomes 

C(fi) = \\\w\y- Wd)\\ 2 + M lncro + constant. (28) 

Splitting the unknowns /i = (fi,ao) into the parameters describing the oscillations, 
fi = (fio,fJ-i, ■ ■ ■ , Hk-2) and the noise level ao, the minimization problem reduces to 
finding the most likely estimates 

A* = argmin \\W\y - Wd)\\ 2 , (29) 
A 

where d = d(fi) . The noise level is explicitly given by 

<T * = M- 1/2 \\W\y - Wd(^)}\\. (30) 

5. The Old Way: Fitting the Power Spectrum Ignoring the Correlations 

Maximum likelihood estimation has been used in the past to infer solar and stellar 
oscillation parameters, even in the case of gapped time series. The joint PDF of the 
observations was assumed to be the product of the PDFs of the individual j/i, as if 
the frequency bins were uncorrelated. For comparison purposes, we briefly review 
this (unjustified) approach. 

According to Equation (12), the PDF of m is a normal distribution 

pM = TE2t^vlM (3i) 

TTVi 

with mean 

M+2p~l 

Vi= Wijdj (32) 

j=o 
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and variance 

M+2p-l 

«i= E l^'lM- (33) 

Under the (wrong) assumption that the yi are independent random variables, the 
joint PDF of y becomes 

M-l 

Py C (v)= II (3 4 ) 

i=0 

where the superscript "nc" stands for "no correlation" . This joint PDF uses the 
correct mean (y^) and variance (vj) of the data, but it ignores all the non- vanishing 
cross-terms E[y* yj]. In other words, the spread of power implied by the convolution 
with the window is taken care of, but not the proper statistics. 

Under the same simplifying "no correlation" assumption, the log-likelihood func- 
tion is 

M-l _ |2 M-l 
£. nc (n) = + Invi + constant, (35) 

i=0 % i=0 

where the y i and Vi are implicit functions of the model parameters \i. 
5.1. Special Case: Solar-Like Oscillations 

If the signal has no deterministic component (d = 0), then the power spectrum 
[Pi (ft) = \yi\ 2 ] has the expectation value Pi = E[P{] = v\. Thus, in the case of purely 
solar-like oscillations, we recover the standard expression (Toutain and Appourchaux, 
1994): 

jC nc (fi) = [ =~ + hi-P; J + constant when d = 0. (36) 



i=0 



While the above expression is perfectly valid for uninterrupted data, it is not justified 
when gaps are present. The parameters /x" c that minimize jC nc (fi) are not optimal, 
as will be shown later using Monte-Carlo simulations. 

5.2. Special Case: Deterministic Oscillations plus White Noise 

When Gi = o"o = constant, the "no-correlation" log-likelihood function simplifies to 
C nc (ii) = \ ^ P ~ W . d }\ + M In + constant. (37) 

a o E5=- P Kr 

The minimization problem becomes 

2 



fiT = argmin \\y-Wd\\\ (38) 
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where d(fi) depends on the oscillation parameters jl = (no, /xi, . . . , fik—2)- The noise 
level is explicitly given by 



6. Simulation of Artificial Time Series 

So far we have considered a general signal which includes a deterministic component 
and a stochastic component. The parametrisation of each component depends on 
prior knowledge about the physics of the stellar oscillations. Solar-like pulsations are 
stochastic in nature and no deterministic component is needed in this case. On the 
other hand, long-lived stellar pulsations are treated as deterministic. Some stars may 
support both deterministic and stochastic oscillations. In this section, we model the 
two cases separately. 

We want to test the fitting method [Equations (24) and (25)] by applying it to 
simulated time series with gaps. For comparison, we also want to apply the old fitting 
method (Section 5) to the same time series. We need to generate many realizations 
of the same random process in order to test the estimators for bias and precision: 
this is called Monte-Carlo simulation. In Section 6.1 we discuss the generation of 
the synthetic window functions. We then discuss the parametrisation of the solar- 
like oscillations (Section 6.2) and the deterministic oscillations (Section 6.3) used to 
simulate the unconvolved signal. 

6.1. Synthetic Window Functions 

We generate three different observation windows, corresponding to different duty cy- 
cles. The observation windows are first constructed in the time domain. By definition, 
Wi is set to one if an observation is available and zero otherwise. The total length 
of all time series is fixed at T = 16.5 days (frequency resolution Av = 0.7 fiRz). 
A window function is characterized by two main properties: the duty cycle (fraction 
of ones) and the average periodicity. A typical window function for a single ground- 
based site has a 24-hour periodicity. In order to deviate slightly from purely periodic 
window functions we introduce some randomness for the end time of each observation 
block. 

Figure 2(b) - (d) shows the power spectra of the three window functions. The first 
and second window functions have a main periodicity of 24 hours and duty cycles 
of 66% and 30% respectively. Two side lobes occur at frequencies 11.6 fiRz and 
23.1 fxRz. The non- vanishing power between the side lobes is due to the deviation 
from a purely periodic window. The third window function has a main periodicity 
of 48 hours and a duty cycle of only 15%. All of these window functions are not 
unrealistic. 

We apply a sharp low-pass filter at frequency v p = 34.3 fiKz (p = 49) to all 
window functions. The power at higher frequencies corresponds to about 5% of the 
total power in the windows. This truncation is needed to apply the fitting algorithm, 
which assumes that there exists a frequency v v beyond which the power in the 
window vanishes, i.e. that the window function is band limited. 




(39) 
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Figure 2 . Square root of the power spectra of the synthetic window functions [w] used in this 
paper. From top to bottom, the duty cycle is (a) 100%, (b) 66%, (c) 30%, and (d) 15%. The 
main periodicity of the window is 24 hours for cases (b) and (c), and 48 hours for window (d). 
All windows are truncated at frequency v v = 34.3 ^Hz. 



6.2. Modeling Solar-Like Oscillations 

We generate the realizations of the unconvolved solar-like oscillation signal directly 
in the Fourier domain. We consider a purely stochastic signal (d = 0) and a single 
mode of oscillation. Since we assumed stationarity in the time domain, the Fourier 
spectrum of the unconvolved signal for one single mode can be written as 

Xi = e t = [SL{ Vl )+M] 1/2 m , i = 0,l,...,M + 2p-l, (40) 
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where L describes the line profile of the mode in the power spectrum, S is the mode's 
maximum power, J\f is the variance of the background noise, and rji is a centered 
complex Gaussian random variable with unit variance and independent real and 
imaginary parts. Solar-like oscillations are stochastically excited and intrinsically 
damped by turbulent convection (Goldreich and Keeley, f977; Stein et at, 2004). 
The expectation value of the power spectrum is nearly Lorentzian, except for some 
line asymmetry (e.g., Duvall et al, f993). Here we use a simple asymmetric line 
profile: 

T , , (l + bX) 2 + b 2 . , „, v-uo 

L(v) = ^— '—Z. — w ith X = t-Z, (41) 

\ ) 1 + X 2 y/2 ' y ' 

where v$ is the resonant frequency, b is the asymmetry parameter of the line profile 
(|6| <C 1), and T is a measure of the width of the line profile. We refer to S/J\f as 
the signal-to-noise ratio in the power spectrum. As b tends to zero, T becomes the 
full width at half maximum (FWHM) of the power spectrum and l/(irT) the mode 
lifetime. There are five model parameters, fi = (Vo, T, 6, 5, AT). 

Once the unconvolved signal x has been generated in the Fourier domain, the 
observed signal y is obtained by multiplication with the window matrix W, as 
explained above. 

6.3. Modeling Deterministic Sinusoidal Oscillations plus White Noise 

In the time domain, we consider a purely sinusoidal function on top of white back- 
ground noise: 

Xi = A sin (2ttvq U + ip) + a t m i = 0, 1, . . . , N - 1. (42) 

The first term describes the deterministic component of the signal, where A is the 
amplitude, the mode frequency, and ip the phase of the mode. The second term 
is stochastic noise with standard deviation at- The r\i are N normally distributed 
independent real random variables with zero mean and unit variance. The observed 
signal is obtained by multiplying Xi by the window Wi in the time domain. The 
model parameters are [i = (uq, tp, A, at). 

We have defined the signal and the noise in the time domain, but a definition of 
signal-to-noise ratio in the Fourier domain is desirable. On the one hand, the variance 
of the noise in the Fourier domain is 

P 2 P 

i—~p i——p 

where ^\ \wi\ 2 is the total power in the window. On the other hand, the maximum 
power of the signal in Fourier Sp£lC6 is -/"max 

= yl 2 |i&o| 2 /4, where |u)o| 2 is the power 
of the window at zero frequency. Thus, by analogy with the solar-like case, it makes 
sense to define the signal to noise ratio in the Fourier domain as 

^=^4 m 



IE, 



't 



In practice we fix A and S/Af and deduce the corresponding noise level at- 
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7. Testing the Fitting Methods 

Several hundreds of realizations are needed in order to assess the quality of a fitting 
method. We do not intend to test all possible combinations of mode parameters but 
we want to show a few cases for which the new fitting method provides a significant 
improvement compared to the old fitting method. 

7.1. Solar-Like Oscillations: Window Function with 30% Duty Cycle 

Figure 3 shows one realization of a simulated mode of solar-like oscillation with input 
parameters u a = 3000 /iHz, T = 3.2 //Hz, S = 0.9, M = 0.15, and 6 = 0.1. The signal 
to noise ratio is S/Af = 6 and the window function is 30% full (see Figure 2(c)). The 
mode lifetime is l/(-7rr) = 27.6 hours. Figure 3(a) displays the real and imaginary 
parts of the Fourier transform y, together with the standard deviation of the data 
(nc fit in blue, new fit in red, expectation value in green). Figure 3(b) shows the 
power spectrum and the fits. Notice the sidelobes introduced by the convolution of 
the signal with the window functions. The "no-correlation" fit is done on the power 
spectrum [Equation (36)], while the new fit is performed in complex Fourier space 
[Equation (24)]. 

Each fit shown in Figure 3 corresponds in fact to the best fit out of five fits 
with different initial guesses. For each realization, we use the frequency guesses 
3000 + (0, ±5.5, ±11.9) /iHz for vq. The last two frequency guesses correspond to 
the frequencies of the two main sidelobes of the window function (Figure 2c). For 
the other parameters, we choose random guesses within ±20% of the input values. 
The reason for using several guesses is to ensure that the fit converges to the global 
maximum of the likelihood, not to a nearby local maximum, i.e. that the estimates 
returned by the code are the MLE estimates defined by Equation (25). In some cases, 
the global maximum coincides with a sidelobe at ±11.9 /xHz from the main peak. We 
note that the new fitting method requires a much longer computing time than the 
old nc method: typically, three hours on a single CPU core for a single realization 
(five guesses, five fits). 

For the particular realization of Figure 3, the new fit is closer to the expectation 
value (i.e. is closer to the answer) than the old nc fit. No conclusions should be 
drawn, however, from looking at a single realization. 

In order to test the reliability of each fitting method, we computed a total of 750 
realizations with the same input parameters as in Figure 3 and the same window 
function (30% full) . The quality (bias and precision) of the estimators can be studied 
from the distributions of the inferred parameters. As shown by the distributions of 
Figure 4 the new fitting method is superior to the old nc method. This is true for 
all the parameters, in particular the mode frequency i/q. The distributions for the 
mode frequency (Figure 4(a)) are quite symmetric and Gaussian-like, although the 
old fitting method leads to a significant excess of values beyond the two-er mark. We 
note that, in general, the old fitting method is more sensitive to the initial frequency 
guess. Also the estimates of the linewidth T and the mode power S are significantly 
more biased with the old fitting method than with the new one (Figures 4b, 4c). 
It is worth noting that the fits return a number of small T/large S estimates away 
from the main peaks of the distributions, less so for the new fits. These values 
correspond to instances when the signal barely comes out of the noise background. 
The new fit returns the noise level [Af] with a higher precision and a lower number 
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Figure 3. Example of a realization of one mode of a solar-like oscillation (black line) with 
input frequency uo = 3000 /iHz, lincwidth V = 3.2 /xHz, and S/Af = 6. The window function is 
30% full. Panel (a) shows the real and imaginary parts of the Fourier spectrum. Panel (b) shows 
the power spectrum. The vertical dashed lines represent the width of the window function. 
Also shown are the new fit (red), the old fit (blue), and the expectation value (green). 



of underestimated outliers than the old method (the ouliers are represented by the 
vertical bars in Figure 4(d)). Although the estimation of the asymmetry parameter 
is unbiased with the new fitting method (Figure 4(e)), the uncertainty on b is so 
large that it probably could have been ignored in the model. 

Quantitative estimates of the mean and the dispersion of the estimators are 
provided in Table 1. Because the distributions of the estimated parameters are not 
always Gaussian and may contain several outliers, we compute the median (instead 
of the mean) and the lower and upper bounds corresponding to ±34% of the points 
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Figure 4. Distributions of the inferred oscillation parameters from fits to 750 realizations of a 
single mode of solar-like oscillation. The input parameters are given in Table 1 and the window 
function is 30% full. The five panels show the distributions of the inferred (a) mode frequency 
vq, (b) linewidth T, (c) mode power S, (d) noise level Af, and (e) asymmetry parameter b. The 
black lines show the results obtained with the new fitting method and the grey lines show the 
old "no-correlation" fits. The vertical dashed line in each plot indicates the input value. The 
horizontal lines in panel (a) are intervals containing 68% of the fits for the new (black line) 
and the old (grey line) fitting methods. The thick black and grey vertical lines in panel (d) 
give the numbers of outliers with In TV < —5. 

on each side of the median (instead of the one-cr dispersion). This definition has the 
advantage of being robust with respect to the outliers. The notation 3000.0^ /iHz 
in the first row of Table 1 means that the median mode frequency is 3000.0 /iHz and 
that 68% of the fits belong to the interval [3000.0 -a, 3000.0 + b] ^tHz. We emphasize 
that the subscript —a and the superscript +b do not refer to an uncertainty in the 
determination of the median: the median is known to a much higher precision thanks 
to the large number of realizations. Later we relax the language and refer to the 
"one-cr uncertainty" to mean the average a = (a + 6)/2. 
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Table 1. Medians and scatters of the distributions of the es- 
timated parameters of solar-like oscillation (see Figure 4). The 
window function is 30% full, the input linewidth is 3.2 fiHz, 
and the input signal-to-noise ratio is S/Af = 6. The new and 
old MLE estimates are given in the last two columns. By 
definition, 68% of the fits fall within the bounds set by the 
subscripts/superscripts (the notation is explained in detail in 
the text). 



Mode parameter Input value New fitting Old fitting 





3000.0 


3000.0+^* 


3000.0+2 s 


ln(r[,uHz]) 
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0.8+- 


2+ 1 1 
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Table 2. Medians and scatters of the mode frequency estimates (solar-like 
oscillations) for the window functions defined in Section 6.1. The input 
mode frequency is vq = 3000 ^Hz, the input linewidth is Y = 3.2 /xHz, 
and the signal-to-noise ratio is fixed at SjM = 6. The mode lifetime is 
27.6 hours. 



Window function Frequency estimate [^Hz] 

Duty cycle Main period Average gap New fitting Old fitting 
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The numbers from the last two columns in Table 1 confirm the analysis of Figure 4. 
The mode frequency can be measured with a precision of 1.4 /xHz, which is exactly 
twice better with the new fitting method than with the old one. This gain in precision 
is very significant and potentially important. Since measurement uncertainty scales 
like T -1 / 2 (Libbrecht, 1992), one may equate the gain in using the proper fitting 
procedure to an effective increase in the total length of the time series by a factor 
of four. As seen in Table 1, the linewidth, the mode power, the background noise, 
and the line asymmetry parameter are all less biased and more precise with the new 
fitting method than the old one. Notice that the larger dispersions in the old-fit case 
are due in part to non-Gaussian distributions with extended tails. 
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Figure 5. Distributions of the mode frequency and the linewidth for 750 realizations of 
solar-like oscillations, using the old fitting method (panels a and b) and the new fitting method 
(panels c and d). The observation windows have a duty cycle of 15%, 30%, 66%, and 100%. 
The vertical dashed lines represent the input values. The input linewidth is V = 3.2 /xHz 



7.2. Solar-Like Oscillations: Different Window Functions 



Here we study how bias and precision change as the window function changes, in 
particular as the duty cycle changes. We consider the four window functions defined 
in section 6.1 with duty cycles [a] equal to 15%, 30%, 66%, and 100%. First we 
consider input parameters of solar-like oscillations that are exactly the same as in 
the previous section: i/q = 3000 /iHz, T = 3.2 /xHz, S = 0.9, S/Af = 6, and b = 0.1. 
Figure 5 shows the distributions of the inferred mode frequencies and linewidths, 
using the old (Figures 5a, 5b) and the new (Figures 5(c), 5(d)) fitting methods. Each 
fit is the best fit from five different vq guesses (see Section 7.1). The distributions 
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Figure 6. Uncertainty on estimates of the mode frequency [uq] as a function of the window 
duty cycle [a]. The window functions are as defined in Section 6.1. The red curve shows the 
1-cr Monte-Carlo MLE uncertainties for the new fitting method. The black curve shows the 
1-tr Monte-Carlo MLE uncertainties for the old no-correlation fitting method. The blue curves 
show the mean Cramer- Rao lower bounds (formal error bars). The square symbol with a cross 
at q = 30% in the left panel is a rough estimate (see text). In the left panel, the input 
lincwidth is T = 3.2 /xHz (see also numbers in Table 1). In the right panel, the input linewidth 
is r = 10 iiRz, all of the other parameters being the same as in the left panel. In both panels 
the signal-to-noise ratio is S/Af = 6. For reference, the dashed lines have slope a~ 1 ' 2 . 



for the 100%-window are identical for the two fitting methods; this is expected since 
the old and new fitting methods are equivalent in the absence of gaps. 

The precision on vq using the old "no-correlation" MLE drops fast as the duty 
cycle decreases (Figure 5(a)). This drop is much faster than in the case of the fits 
that take the frequency correlations into account (Figure 5(c)). When the duty cycle 
is 15%, the frequency estimate is five times better with the new than the old method. 
The difference is perhaps even more obvious for the linewidth. For the 15% window, 
it is almost impossible to retrieve T with the old fitting method (Figure 5b), while 
the new method gives estimates that are almost as precise as in the no-gap case 
(Figure 5d). The estimates of T are significantly less biased with the new method. 
Figure 5 confirms the importance of using the correct expression for the likelihood 
function. 

Table 2 gives the medians and half-widths of the i/n distributions. The one-er 
dispersions are plotted as a function of the duty cycle a in the left panel of Figure 6. 
The improvement in the fits is quite spectacular. For example, when a = 15% the 
dispersion on vq is five times less with the new fitting method (1.5 /iHz vs. 7.4 fiHz). 

1/9 

With the old method, the uncertainty on uq increases much faster than a 1 
as the duty cycle a drops (~ aT 1 between the 30% and 15% windows). This 
steep dependence on a is worse than "predicted" by Libbrecht (1992). In his pa- 
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per, Libbrecht suggested to use the uncertainty a va = fY /(AirT) where f((3) = 
(1 + /3) 1/2 [(1 + /3) 1/2 + P 1/2 f and (3 is an "effective" noise-to-signal ratio. He 
suggested that the main effect of the gaps is to increase the noise-to-signal ratio 
Af/S, presumably by a factor J^, |u)i| 2 /|u>o| 2 , itself proportional to 1/a. This leads 

1/2 

however to a dependence of ov on a which, in our particular case, is closer to a ' 
than a~ . We suspect that the Libbrecht formula underestimates the dispersion 
because it ignores the frequency correlations. 

The new fitting method returns a vq uncertainty that is much less sensitive to the 
duty cycle, with a variation like ~ a~ 015 (red curve, left panel of Figure 6). This is 
quite remarkable. That the frequency uncertainty could remain nearly constant for 
a > 30% is not really surprising since the average gap (see numbers in Table 2) is 
less than the mode lifetime r = l/(-7rr) = 27.6 hours. This regime was studied by 
Fossat et al. (1999) using a gap-filling method: as long as the signal-to-noise ratio is 
large enough, the signal can be reconstructed. Why the new fit is doing such a good 
job for duty cycles a < 30% is, however, puzzling (at first sight), since the average 
gap (40.7 hours) is larger than the mode lifetime. This can be understood as follows. 
For small duty cycles, the time-series is effectively a collection of nearly independent 
blocks of data, which, for the 30% window function, are eight-hour long on average. 
Since MLE simulations tell us that the uncertainty on the mode frequency for an 
uninterrupted series of eight hours is about 5.5 fiRz, we would expect for the gapped 
time series (T = 16.5 days, 24-hour periodicity) to be able to reach the uncertainty 
5.5/-\/l6 = 1.375 fiYLz. This value, represented by the box with a cross in Figure 6, is 
found to be very close to the MLE estimate from the new fits. Hence, what matters 
at very low duty cycle is the number of independent blocks of continuous data. The 
new fitting method captures this very well, which is satisfying. By comparison, the 
old no-correlation fitting method does poorly (black line). 

In order to further investigate this last point, we ran another set of simulations 
using a mode linewidth Y = 10 /iHz corresponding to a mode lifetime r = 8.8 hours, 
which is significantly smaller than the average gap lengths of the 30% and the 15% 
windows. The other input parameters remained the same as above. We computed and 
fitted 1350 realizations. The results are shown in the right panel of Figure 6. For the 
new fitting method, the dependence of the frequency uncertainty on the duty cycle 
is about a~ ' 12 , which is comparable to the previous simulations with Y = 3.2 /iHz. 
We conclude that it is really worth solving for the correct minimization problem and 
that fitting for the phase information in complex Fourier space is important to get 
a good match between the model and the data. Of course, this can only be done 
properly when we have a perfect knowledge of the model, which is the case with 
these numerical simulations, but is rarely the case with real observations. 

7.3. Solar-Like Oscillations: Cramer-Rao Lower Bounds 

Monte-Carlo simulations are very useful in order to assess the variance and the bias 
of a particular estimator. When fitting real observations, however, the variance of the 
estimator cannot be computed directly by Monte-Carlo simulation since the input 
parameters are, by definition, not known. Hopefully, the fit can return a formal error 
from the shape of the likelihood function in the neighborhood of the global maximum. 

The Cramer-Rao lower bound (Kendall and Stuart, 1967) achieves minimum vari- 
ance among unbiased estimators. It is obtained by expanding C about its minimum. 
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Figure 7. Distributions of formal errors on the mode frequency obtained by inverting the 
Hessian. The left panel is for the simulation with Y = 3.2 /jHz (see Figure 5c) and the right 
panel is for T = 10 /xHz. The different curves correspond to different window functions, as 
indicated in the legend. The means of these distributions (Cramer-Rao lower bounds) give the 
blue curves plotted in Figure 6. 



The formal error a fli on the parameter \Xi is given by 

o-w = V^ti, (45) 

where Ka is the ith element on the diagonal of the inverse [K = H- 1 } of the Hessian 
matrix with elements 

Hij = for i,j=0,l,...,fc-l. (46) 

The Cramer- Rao formal errors have been used in helioseismology by, for example, 
Toutain and Appourchaux (1994), Appourchaux, Gizon, and Rabello-Soares (1998), 
and Gizon and Solanki (2003). 

We have computed the formal error on the mode frequency for many realizations 
and for all window functions. The resulting distributions are shown in Figure 7. The 
mean formal error from each distribution is plotted in Figure 6. Overall the Cramer- 
Rao lower bound is remarkably close to the Monte-Carlo MLE uncertainty using the 
new fitting method; they are even undistinguishable when T = 10 /iHz. 

This is useful information as it means that, on average, the Hessian method pro- 
vides reasonable error estimates. It should be clear, however, that the distributions 
shown in Figure 7 show a significant amount of scatter: the formal error from the 
Hessian may be misleading for particular realizations. 
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Figure 8. Real and imaginary part of the Fourier transform of a simulated gapped time series 
containing a sinusoid on top of white background noise. The signal-to- noise ratio is S/Af = 100. 
The observation window has a duty cycle of 30%. The simulated data is the thick grey line. 
The thin black line shows the fit to the data using the new fitting method. The fit with the 
old method is not shown since it is almost identical. 



7.4. Sinusoidal Deterministic Oscillation plus White Noise 



Figure 8 shows the Fourier spectrum of a simulated time series containing a sinusoidal 
mode of oscillation on top of a white noise background as described in Section 6.3. 
In this particular case the observation window with a duty cycle of 30% is used (see 
Figure 2(c)). The input parameters of the sinusoidal function are the mode frequency 

= 3000 /iHz, the amplitude A = 1.1, and the phase ip = 60°. The signal-to-noise 
ratio is S/M = 100. The fit shown in Figure 8 was obtained with the new fitting 
method. Since we found no significant difference between the old and the new fitting 
methods in this case, the old fitting method is not shown. Differences between the 
data and the fit are essentially due to the noise. 

We computed 500 realizations of sinusoidal oscillations with the same mode pa- 
rameters (frequency, amplitude, phase) as above, the same observation window (30% 
full), but with a signal-to-noise ratio S/M = 46. The resulting distributions of the 
inferred parameters obtained with the two fitting methods are shown in Figure 9. For 
this simulation, the known input values were used as an initial guess to speed up the 
minimization; we checked on several realizations that it is acceptable to do so when 
the signal to noise ratio is large. The distributions of the inferred parameters (Figure 
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Figure 9. Distributions of the inferred oscillation parameters for a set of 500 realizations of 
long-lived sinusoidal oscillations with S/M = 46. The window function with a duty cycle of 
30% is used. The black and the grey lines are for the new and old fitting methods respectively. 
The vertical dashed line in each plot indicates the input value. The parameters shown are (a) 
the mode frequency [vq\, (b) the logarithm of the mode amplitude [In A], (c) the phase of the 
oscillation [<f>], and (d) the logarithm of the noise level [Intro] (see Section 6.3). Notice that the 
estimate of the noise is biased when frequency correlations are ignored (old nc fit), although 
by a very small amount. 



9) show that for sinusoidal oscillations, the new fitting method does not provide any 
significant improvement compared to the old fitting method. 

We emphasize that the fitting parameters can be determined with a very high 
precision when the noise level is small. In particular, we confirm that the uncer- 
tainty of the frequency estimator can be much smaller than l/T (see Figure 9(a)). 
Figure 10 shows the median and the standard deviation of the mode frequency for 
different signal-to-noise ratios. Each symbol and its error bar in Figure 10 is based 
on the computation of 500 realizations of sinusoidal oscillations with the same mode 
parameters as above, the same observation window (30% full), but various signal-to- 
noise ratios. Since we did not find any significant difference between the two fitting 
methods, only the results obtained with the new fitting method are shown. Figure 
10 illustrates that even for a relatively low signal-to-noise ratio of S/M = 10, the 
standard deviation of the inferred mode frequency is smaller than l/T by a factor 
of four. For higher signal-to-noise ratios the precision is even more impressive: when 
S/M = 100, the standard deviation of the mode frequency is about 20 times smaller 
than l/T. 

The theoretical value of the standard deviation of the mode frequency obtained by 
Cuypers (1987) can be extended to the case of gapped data (Cuypers, 2008, private 
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Figure 10. Median (cross) and standard deviation (vertical bar) of the inferred frequency 
of sinusoidal oscillation [vq\ as a function of signal-to-noise ratio S/N. The duty cycle is 
30%. Only the results obtained with the new fitting method are shown. The horizontal grey 
line shows the input mode frequency. The dashed grey lines show the theoretical value of 
frequency uncertainty, a vo , given by Equation (47) . The vertical axis of the plot spans the 
interval Av = 1/T = 0.7 fiHz. 



communication) as follows: 

\/6 at /.-a 
a VQ = Arr y= , (47) 

where A is the amplitude of the sinusoid in the time domain, at is the rms value of the 
noise, n = aN is the number of recorded data points, and T is the total observation 
length. This theoretical uncertainty is overplotted in Figure 10. The match with 
our Monte-carlo measurements is excellent. This confirms that, in this case, it is 
equivalent to perform the fits in the temporal and in the Fourier domains. Note that 
Equation (47) is only valid under the assumption that the noise is uncorrelated in 
the time domain, a condition fulfilled by our simulations. The main reason why the 
measurement precision is only limited by the noise-to-signal ratio is because perfect 
knowledge of the model is assumed. 



8. Conclusion 



In this paper we derived an expression for the joint PDF of solar or stellar oscillations 
in complex Fourier space, in agreement with the work of Gabriel (1994). This joint 
PDF explicitly takes into account frequency correlations introduced by the convolu- 
tion with the window function. We implemented a maximum likelihood estimation 
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method to retrieve the parameters of stellar oscillations. Both stochastic solar-like 
oscillations and deterministic sinusoidal oscillations were considered. 

In the case of solar-like oscillations, we performed Monte-Carlo simulations to 
show that the improvement provided by our fitting method can be very significant 
in comparison with a fitting method that ignores the frequency correlations. The 
results are summarized in Figure 6. In one particular example, using an observation 
window with a duty cycle a = 30% and a signal-to-noise ratio S/M = 6, the new 
fitting method increased the precision of the mode frequency by a factor of two and 
the estimates of the linewidth and mode power were less biased and more precise. 
For a window with a duty cycle a = 15%, the precision on the mode frequency 
estimate was increased by a factor of five. We also found that the Cramer-Rao lower 
bounds (formal errors) can provide reasonable estimates of the uncertainty on the 
MLE estimates of the oscillation parameters. 

In the case of long-lived, purely sinusoidal oscillations, we did not find any signifi- 
cant improvement in using this new fitting method. Yet, we confirm that the standard 
deviation of the mode frequency can be measured in Fourier space with a precision 
much better than 1 /T for large signal-to-noise ratios, in accordance with a previous 
time-domain calculation (Cuypers, 1987; Cuypers, 2008, private communication). 

The analysis of time series containing many gaps can benefit from our work. 
Applications may include, for example, the re-analysis of solar oscillations from the 
early days of the BiSON network (Miller et al, 2004) or the solar-like oscillations of 
a Centauri observed from the ground with two telescopes (Butler et al, 2004). 
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